Tuple Merging in Probabilistic Databases

Authors

Fabian Panse

Norbert Ritter

Abstract

Real-world data are often uncertain and incomplete. In probabilistic relational data models, uncertainty can be modeled on two levels: first, by representing the uncertain instance of a tuple as a set of possible instances, and second, by assigning each tuple its degree of membership in the considered relation. To overcome incompleteness, data from multiple sources need to be combined. Combining data from autonomous probabilistic databases requires an integration of probabilistic data. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML). So far, little attention has been paid to the integration of uncertain (especially probabilistic) source data. In this paper, we consider probabilistic tuple merging as an essential step in the integration of probabilistic data. We present techniques for merging uncertain instance data as well as for merging different degrees of tuple membership.
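
The two levels of uncertainty described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's merging techniques, so everything here is an assumption made for illustration: the `ProbTuple` class, the `merge` function, the weight `w1`, and the choice of a weighted mixture are all hypothetical and should not be read as the authors' method.

```python
from dataclasses import dataclass


@dataclass
class ProbTuple:
    # Instance level: each possible instance (a plain tuple of attribute
    # values) mapped to its probability; the probabilities sum to 1.
    instances: dict
    # Tuple level: degree of membership of the tuple in the relation.
    membership: float


def merge(t1, t2, w1=0.5):
    """Hypothetical merge: a weighted mixture applied on both levels.

    This is an illustrative assumption, not the technique from the paper.
    """
    w2 = 1.0 - w1
    keys = set(t1.instances) | set(t2.instances)
    # Mix the two distributions over possible instances.
    instances = {
        k: w1 * t1.instances.get(k, 0.0) + w2 * t2.instances.get(k, 0.0)
        for k in keys
    }
    # Mix the two membership degrees with the same weights.
    membership = w1 * t1.membership + w2 * t2.membership
    return ProbTuple(instances, membership)


# Two made-up source tuples that are assumed to describe the same entity.
a = ProbTuple({("Tim", "pilot"): 0.75, ("Tim", "painter"): 0.25}, 1.0)
b = ProbTuple({("Tim", "pilot"): 0.25, ("Tom", "pilot"): 0.75}, 0.5)
merged = merge(a, b)
```

With equal weights, the merged tuple keeps every instance that either source considers possible, and its membership degree lies between the two source degrees. Other combination rules are possible; the mixture is chosen here only because it keeps the instance probabilities normalized.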

Similar resources

Making massive probabilistic databases practical

Existence of incomplete and imprecise data has moved the database paradigm from deterministic to probabilistic information. Probabilistic databases contain tuples that may or may not exist with some probability. As a result, the number of possible deterministic databases that can be instances of a probabilistic database grows exponentially with the number of probabilistic tuples. In this paper,...

Symmetry in Probabilistic Databases

Researchers in databases, AI, and machine learning have all proposed representations of probability distributions over relational databases (possible worlds). In a tuple-independent probabilistic database, the possible worlds all have distinct probabilities, because the tuple probabilities are distinct. In AI and machine learning, however, one typically learns highly symmetric distributions, w...

On Entity Resolution for Probabilistic Data

Entity resolution (ER) is the problem of identifying duplicate tuples, which are the tuples that represent the same real-world entity. There are many real-life applications in which the ER problem arises. These applications range from news aggregation websites, identifying the news that cover the same story, in order to avoid presenting one story several times to the user, to the integration of...

A Probabilistic NF2 Relational Algebra for Imprecision in Databases

We present a probabilistic data model which is based on relations in non-first-normal-form (NF2). Here, tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. This way, imprecise attribute values are modelled as a probabilistic subrelation. For information retrieval, the set of weighted index terms of a document can be represented in the same way, thu...

Sensitivity Analysis and Explanations for Robust Query Evaluation in Probabilistic Databases

Probabilistic database systems have successfully established themselves as a tool for managing uncertain data. However, much of the research in this area has focused on efficient query evaluation and has largely ignored two key issues that commonly arise in uncertain data management: First, how to provide explanations for query results, e.g., “Why is this tuple in my result?” or “Why does this...

Publication date: 2010
